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Abstract We consider the problem of inferring the unknown parameters of a 
stochastic biochemical network model from a single measured time-course of the 
concentration of some of the involved species. Such measurements are available, 
e.g., from live-cell fluorescence microscopy in image-based systems biology. In 
addition, fluctuation time-courses from, e.g., fluorescence correlation spectroscopy 
provide additional information about the system dynamics that can be used to more 
robustly infer parameters than when considering only mean concentrations. Esti- 
mating model parameters from a single experimental trajectory enables single-cell 
measurements and quantification of cell-cell variability. We propose a novel com- 
bination of an adaptive Monte Carlo sampler, called Gaussian Adaptation, and ef- 
ficient exact stochastic simulation algorithms that allows parameter identification 
from single stochastic trajectories. We benchmark the proposed method on a linear 
and a non-linear reaction network at steady state and during transient phases. In ad- 
dition, we demonstrate that the present method also provides an ellipsoidal volume 
estimate of the viable part of parameter space and is able to estimate the physical 
volume of the compartment in which the observed reactions take place. 



1 Introduction 



Systems biology implies a holistic research paradigm, complementing the reduc- 
tionist approach to biological organization [16, 15]. This frequently has the goal of 
mechanistically understanding the function of biological entities and processes in 
interaction with the other entities and processes they are linked to or communicate 
with. A formalism to express these links and connections is provided by network 
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models of biological processes [4, 1]. Using concepts from graph theory [26] and 
dynamic systems theory [44], the organization, dynamics, and plasticity of these 
networks can then be studied. 

Systems biology models of molecular reaction networks contain a number of 
parameters. These are the rate constants of the involved reactions and, if spatiotem- 
poral processes are considered, the transport rates, e.g. diffusion constants, of the 
chemical species. In order for the models to be predictive, these parameters need to 
be inferred. The process of inferring them from experimental data is called param- 
eter identification. If in addition also the network structure is to be inferred from 
data, the problem is called systems identification. Here, we consider the problem of 
identifying the parameters of a biochemical reaction network from a single, noisy 
measurement of the concentration time-course of some of the involved species. 
While this time series can be long, ensemble replicas are not possible, either be- 
cause the measurements are destructive or one is interested in variations between 
different specimens or cells. This is particularly important in molecular systems bi- 
ology, where cell-cell variations are of interest or large numbers of experimental 
replica are otherwise not feasible [45]. 

This problem is particularly challenging and traditional genomic and proteomic 
techniques do not provide single-cell resolution. Moreover, in individual cells the 
molecules and chemical reactions can only be observed indirectly. Frequently, fluo- 
rescence microscopy is used to observe biochemical processes in single cells. Fluo- 
rescently tagging some of the species in the network of interest allows measuring the 
spatiotemporal evolution of their concentrations from video microscopy and fluores- 
cence photometry. In addition, fluorescence correlation spectroscopy (FCS) allows 
measuring fluctuation time-courses of molecule numbers [23]. 

Using only a single trajectory of the mean concentrations would hardly allow 
identification of network parameters. There could be several combinations of net- 
work parameters that lead to the same mean dynamics. A stochastic network model, 
however, additionally provides information about the fluctuations of the molecular 
abundances. The hope is that there is then only a small region of parameter space that 
produces the correct behavior of the mean and the correct spectrum of fluctuations 
[31]. Experimentally, fluctuation spectra can be measured at single-cell resolution 
using FCS. 

The stochastic behavior of biochemical reaction networks can be due to low copy 
numbers of the reacting molecules [39, 10]. In addition, biochemical networks may 
exhibit stochasticity due to extrinsic noise. This can persist even at the continuum 
scale, leading to continuous-stochastic models. Extrinsic noise can, e.g., arise from 
environmental variations or variations in how the reactants are delivered into the sys- 
tem. Also measurement uncertainties can be accounted for in the model as extrinsic 
noise, modeling our inability to precisely quantify the experimental observables. 

We model stochastic chemical kinetics using the chemical master equation 
(CME). Using a CME forward model in biological parameter identification amounts 
to tracking the evolution of a probability distribution, rather than just a single 
function. This prohibits predicting the state of the system and only allows state- 
ments about the probability for the system to be in a certain state, hence requiring 
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sampling -based parameter identification methods. In the stochastic-discrete con- 
text, a number of different approaches have been suggested. Boys et al. proposed 
a fully Bayesian approach for parameter estimation using an explicit likelihood for 
data/model comparison and a Markov Chain Monte Carlo (MCMC) scheme for 
sampling [5]. Zechner et al. developed a recursive Bayesian estimation technique 
[46, 45] to cope with cell-cell variability in experimental ensembles. Toni and co- 
workers used an approximate Bayesian computation (ABC) ansatz, as introduced 
by Marjoram and co-workers [25], that does not require an explicit likelihood [43]. 
Instead, sampling is done in a sequential Monte Carlo (or particle filter) framework. 
Reinker et al. used a hidden Markov model where the hidden states are the actual 
molecule abundances and state transitions model chemical reactions [40]. Inspired 
by Prediction Error Methods [24], Cinquemani et al. identified the parameters of a 
hybrid deterministic-stochastic model of gene expression from multiple experimen- 
tal time courses [7]. Randomized optimization algorithms have been used, e.g., by 
Koutroumpas et al. who applied a Genetic Algorithm to a hybrid deterministic- 
stochastic network model [21]. More recently, Poovathingal and Gunawan used 
another global optimization heuristic, the Differential Evolution algorithm [32]. A 
variational approach for stochastic two-state systems is proposed by Stock and co- 
workers based on Maximum Caliber [41], an extension of Jaynes' Maximum En- 
tropy principle [14] to non-equilibrium systems. If estimates are to be made based 
on a single trajectory, the stochasticity of the measurements and of the model leads 
to very noisy similarity measures, requiring optimization and sampling schemes that 
are robust against noise in the data. 

Here, we propose a novel combination of exact stochastic simulations for a 
CME forward model and an adaptive Monte Carlo sampling technique, called 
Gaussian Adaptation, to address the single-trajectory parameter estimation problem 
for monostable stochastic biochemical reaction networks. Evaluations of the CME 
model are done using exact partial-propensity stochastic simulation algorithms [35]. 
Parameter optimization uses Gaussian Adaptation. The method iteratively samples 
model parameters from a multivariate normal distribution and evaluates a suitable 
objective function that measures the distance between the dynamics of the forward 
model output and the experimental measurements. In addition to estimates of the 
kinetic parameters in the network, the present method also provides an ellipsoidal 
volume estimate of the viable part of parameter space and is able to estimate the 
physical volume of the intra-cellular compartment in which the reactions take place. 

We assume that quantitative experimental time series of either a transient or the 
steady state of the concentrations of some of the molecular species in the network 
are available. This can, for example, be obtained from single-cell fluorescence mi- 
croscopy by translating fluorescence intensities to estimated chemical concentra- 
tions. Accurate methods that account for the microscope's point-spread function 
and the camera noise model are available to this end [12, 13, 6]. Additionally, FCS 
spectra can be analyzed in order to quantify molecule populations, their intrinsic 
fluctuations, and lifetimes [23, 34, 39]. The present approach requires only a single 
stochastic trajectory from each cell. Since the forward model is stochastic and only a 
single experimental trajectory is used, the objective function needs to robustly mea- 
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sure closeness between the experimental and the simulated trajectories. We review 
previously considered measures and present a new distance function in Sec. 5. First, 
however, we set out the formal stochastic framework and problem description below. 
We then describe Gaussian Adaptation and its applicability to the current estimation 
task. The evaluation of the forward model is outlined in Sec. 4. We consider a cyclic 
linear chain as well as a non-linear colloidal aggregation model as benchmark test 
cases in Sec. 6 and conclude in Sec. 7. 



2 Background and problem statement 

We consider a network model of a biochemical system given by M coupled chemical 
reactions 

X>,,.s ; 4 -I> ; ; ( s ; v/ = i,...,m (i) 

between N species, where v~ = [v,r.] and V + = [v^] are the stoichiometry matrices 
of the reactants and products, respectively, and S, is the i th species in the reaction 
network. Let n, be the population (molecular copy number) of species S;. The reac- 
tions occur in a physical volume £2 and the macroscopic reaction rate of reaction j 
is kj. This defines a dynamic system with state n(t) = [«;(?)] and M+ 1 parameters 
= [*!,. ..,*«,«]. 

The state of such a system can be interpreted as a realization of a random variable 
n(t ) that changes over time t . All one can know about the system is the probability 
for it to be in a certain state at a certain time t j given the system's state history, hence 

P(n(tj) \n{tj-i), . . . ,n(fi),n(f ))d A '« 

= Prob{n(f j) G [n(tj), n(tj) + dn) | n(tj) , i = 0, . . . , j - 1 } . (2) 

A frequently made model assumption, substantiated by physical reasoning, is 
that the probability of the current state depends solely on the previous state, i.e., 

P(n(tj)\n(tj-i),...,n(ti),n(t )) = P(n(tj)\n(tj-i)) . (3) 

The system is then modeled as a first-order Markov chain where the state n evolves 
as 

n(t + At) = n(t) + E(At; n,t) (4) 

This is the equation of motion of the system. If n is real-valued, it defines a 
continuous-stochastic model in the form of a continuous-state Markov chain. Dis- 
crete n, as is the case in chemical kinetics, amount to discrete-stochastic mod- 
els expressed as discrete-state Markov chains. The Markov propagator S is it- 
self a random variable, distributed with probability distribution TI{E, \At;n,t) = 
P(n + £,,t + At\n,t) for the state change For continuous-state Markov chains, 
n is a continuous probability density function (PDF), for discrete-state Markov 
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chains a discrete probability distribution. If JT(|) = 5(4 — £ ), with 5 the Dirac 
delta distribution, then the system's state evolution becomes deterministic with pre- 
dictable discrete or continuous increments £j . Deterministic models can hence be 
interpreted as a limit case of stochastic models [22]. 

In chemical kinetics, the probability distribution fl of the Markov propagator 
is a linear combination of Poisson distributions with weights given by the reaction 
stoichiometry. This leads to the equation of motion for the population n given by 



n(t+At) = n(t) + (v + - V 



Wm 



(5) 



where ~ £?(ai(n(t))At) is a random variable from the Poisson distribution with 
rate X = aj(n(t))At. The second term on the right-hand side of Eq. 5 follows a 
probability distribution II | At;n, t) whose explicit form is analytically intractable 
in the general case. The rates a/, j = I,.. .M, are called the reaction propensities 
and are defined as 



n 



1+13. 



(6) 



They depend on the macroscopic reaction rates and the reaction volume and can be 
interpreted as the probability rates of the respective reactions. Advancing Eq. 5 with 
a At such that more than one reaction event happens per time step yields an ap- 
proximate simulation of the biochemical network as done in approximate stochastic 
simulation algorithms [9, 3]. 

An alternative approach consists in considering the evolution of the state proba- 
bility distribution P(n,t |«0' f o) of the Markov chain described by Eq. 5, hence: 



dP 

~dt 



M 



(7) 



.7=1 V=l 



with the step operator Eff(n) = f(n + pi) for any function / where i is the JV- 
dimensional unit vector along the i th dimension. This equation is called the chemical 
master equation (CME). Directly solving it for P is analytically intractable, but 
trajectories of the Markov chain governed by the unknown state probability P can 
be sampled using exact stochastic simulation algorithms (SSA) [8]. Exact SSAs 
are exact in the sense that they sample Markov chain realizations from the exact 
solution P of the CME, without ever explicitly computing this solution. Since SSAs 
are Monte Carlo algorithms, however, a sampling error remains. 

Assuming that the population n increases with the volume £2, n can be approx- 
imated as a continuous random variable in the limit of large volumes, and Eq. 5 
becomes 

n(t + At) =n{t) + {y+ -yr) ... , (8) 
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where T], ~ jV{ai{n{t))At, ai{n{t))At) are normally distributed random variables. 
The second term on the right-hand side of this equation is a random variable that is 
distributed according to the corresponding Markov propagator T1(E, \ At;n,t), which 
is a Gaussian. Equation 8 is called the chemical Langevin equation with fl given by 



n(£\At;n,t) = (2n)- N ' 2 \E I^V \ «-M) T r 1 ; 



(9) 



where 

jd = At (v + -V _ ) 



fli(n(f)) 
a M {n(t)) 



and L = At ( V + - V " ) diag (n (t ) ) ( v + - V~ ) 1 



The corresponding equation for the evolution of the state PDF is the non-linear 
Fokker-Planck equation, given by 



where 



U=V T (^V-F)P(».,.. 



(10) 



(11) 



and 



1 f +c ° 

At^Q At J_oo — 



(12) 



At much larger i2, when the population n is on the order of Avogadro's number, 
Eq. 8 can be further approximated as 



n(t + At) =n(t) + (v + - v") 



<^i(n(/))4f 
<p M (n(t))At 



(14) 



where ; -(n) = kjQ. 1 ^'' =1 V, ' J IliLi ( v O0 _1 - Note that the second term on the 
right-hand side of this equation is a random variable whose probability distribution 
is the Dirac delta 



n{£\At\n,t) = 8 4-(v + -v" 



0i(«(f))4f 
<j) M (n(t))At 



(15) 
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Equation 14 hence is a deterministic equation of motion. In the limit At — > this 
equation can be written as the ordinary differential equation 



dx 
df 



= (v + -vr) 



<j>i(x(t)) 

<M*(0) 



(16) 



for the concentration x = n£2 ~ 1 . This is the classical reaction rate equation for the 
system in Eq. 1 . 

By choosing the appropriate probability distribution fl of the Markov propagator, 
one can model reaction networks in different regimes: small population n (small Q) 
using SSA over Eq. 7, intermediate population (intermediate Q.) using Eq. 8, and 
large population (large £2) using Eq. 16. The complete model definition therefore is 

■^(0) = {Y~,Y+,n}. 

The problem considered here can then be formalized as follows: Given a forward 
model ^#(0) and a single noisy trajectory of the population of the chemical species 
n(to + (q— l)zto eX p) at K discrete time points t = to + (q — l)At exp , q = 1,. ..,K, we 
wish to infer 6 = \k\ , . . . ,ku, ^2]. The time between two consecutive measurements 
At eX p and the number of measurements K are given by the experimental technique 
used. As a forward model we use the full CME as given in Eq. 7 and sample trajec- 
tories from it using the partial-propensity formulation of Gillespie's exact SSA as 
described in Sec. 4. 



3 Gaussian Adaptation for global parameter optimization, 
approximate Bayesian computation, and volume estimation 

Gaussian Adaptation (GaA), introduced in the late 1960's by Gregor Kjellstrom 
[17, 19], is a Monte Carlo technique that has originally been developed to solve 
design-centering and optimization problems in analog electric circuit design. De- 
sign centering solves the problem of determining the nominal values (resistances, 
capacitances, etc.) of the components of a circuit such that the circuit output is 
within specified design bounds and is maximally robust against random variations 
in the circuit components with respect to a suitable criterion or objective function. 
This problem is a superset of general optimization, where one is interested in find- 
ing a parameter vector that minimizes (or maximizes) the objective function without 
any additional robustness criterion. GaA has been specifically designed for scenar- 
ios where the objective function f(d) is only available in a black-box (or oracle) 
model that is defined on a real-valued domain CR" and returns real-valued out- 
put K. The black-box model assumes that gradients or higher-order derivatives of 
the objective function may not exist or may not be available, hence including the 
class of discontinuous and noisy functions. The specific black-box function used 
here is presented in Sec. 5. 
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The principle idea behind GaA is the following: Starting from a user-defined 
point in parameter space, GaA explores the space by iteratively sampling single 
parameter vectors from a multivariate Gaussian distribution jY(m,E ! ) whose mean 
and covariance matrix E € R" x " are dynamically adapted based on the infor- 
mation from previously accepted samples. The acceptance criterion depends on the 
specific mode of operation, i.e., whether GaA is used as an optimizer or as a sam- 
pler [28, 27]. Adaptation is performed such as to maximize the entropy of the search 
distribution under the constraint that acceptable search points are found with a pre- 
defined, fixed hitting (success) probability p < 1 [19]. Using the definition of the 

entropy of a multivariate Gaussian distribution ,3^{,jY) = log ^^/(2^e)"det(Z)^ 

shows that this is equivalent to maximizing the determinant of the covariance ma- 
trix E. GaA thus follows Jaynes' Maximum Entropy principle [14]. 

GaA starts by setting the mean m' ' of the multivariate Gaussian to an initial 
acceptable point 9^ and the Cholesky factor of the covariance matrix to the 

identity matrix I. At each iteration g > 0, the covariance E^ is decomposed as: 

^(«) = (r-Qte)^ (r-Q&y = r 2 (2 (i>) )\ where r is the scalar ste P size 

that controls the scale of the search. The matrix is the normalized square root 

of E_( g \ found by eigen- or Cholesky decomposition of E_( g \ The candidate param- 
eter vector in iteration g+ 1 is sampled from a multivariate Gaussian according to 
0(g+i) = +r (g)g{g) J j(g) f w here ?7 te) ~ <sV(0,IJ. The parameter vector is then 

evaluated by the objective function 

Only if the parameter vector is accepted, the following adaptation rules are ap- 
plied: The step size r is increased as A g+l ^ = / e • A g \ where f e > 1 is termed the 
expansion factor. The mean of the proposal distribution is updated as 

= (l-±-) „to + -%(« +1 > . (17) 

N m is a weighting factor that controls the learning rate of the method. The successful 
search direction d [g+l) = -mW) is used to perform a rank-one update of 

the covariance matrix: J^' +1 ) = (l-^j ^ + j^d {g+l) , d {g+l)T '. N c weights the 
influence of the accepted parameter vector on the covariance matrix. In order to 
decouple the volume of the covariance (controlled by A g+l ^) from its orientation, 
Q ig+ V is normalized such that det(g te+1) ) = 1. 

In case is not accepted at the current iteration, only the step size is adapted 

as r (s+ l ) — y c . r (g) f where / c < 1 is the contraction factor. 

The behavior of GaA is controlled by several strategy parameters. Kjellstrom 
analyzed the information-theoretic optimality of the acceptance probability p for 
GaA in general regions [19]. He concluded that the efficiency E of the process and 
p are related as E °< —p\ogp, leading to an optimal p = - w 0.3679, where e is 
Euler's number. A proof is provided in [18]. Maintaining this optimal hitting prob- 
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ability corresponds to leaving the volume of the distribution, measured by det(Z), 
constant under stationary conditions. Since det(Z) = r 2 "det(gg T ), the expansion 

and contraction factors / e and f c expand or contract the volume by a factor of f 2n 
and f 2n , respectively. After S accepted and F rejected samples, a necessary condi- 
tion for constant volume thus is: nf= t (/e ) 2m Fl|L i (/c ) 2 " = 1. Using p = and 
introducing a small j3 > 0, the choice / e = 1 + J3(l — p) and f c = 1 — j3p satisfies 
the constant- volume condition to first order. The scalar rate j3 is coupled to Nq. Afc 
influences the update of E £ R nxn , which contains n 2 entries. Hence, Nq should be 
related to n 2 . We suggested using A^c = (n+ l) 2 /log(n+ 1) as a standard value, and 
coupling j3 = ^ [29]. A similar reasoning is also applied to N m . Since N m influ- 
ences the update of m £ W, it is reasonable to set N m °< n. We propose N m = en as 
a standard value. 

Depending on the specific acceptance rule used, GaA can be turned into a global 
optimizer [29], an adaptive MCMC sampler [28, 27], or a volume estimation method 
[30], as described next. 



3.1 GaA for global black-box optimization 

In a minimization scenario, GaA uses an adaptive-threshold acceptance mechanism. 
Given an initial scalar cutoff threshold Cj \ we accept a parameter vector at 
iteration g+ 1 if /(0^ +1 - > ) < Cj\ Upon acceptance, the threshold cj is lowered as 

= { l ~W^) c t ^ + a)t/(^ (a ' +1) )> where A^t controls the weighting between the 
old threshold and the objective-function value of the accepted sample. This sample- 
dependent threshold update renders the algorithm invariant to linear transformations 
of the objective function. The standard strategy parameter value is A^x = en [28]. 
We refer to [28] for further information about convergence criteria and constraint 
handling techniques in GaA. 

3.2 GaA for approximate Bayesian computation and viable volume 
estimation 

Replacing the threshold acceptance-criterion by a probabilistic Metropolis criterion, 
and setting A^ m = 1, turns GaA into an adaptive MCMC sampler with global adap- 
tive scaling [2]. We termed this method Metropolis-GaA [28, 27]. Its strength is that 
GaA can automatically adapt to the covariance of the target probability distribution 
while maintaining the fixed hitting probability. For standard MCMC, this cannot be 
achieved without fine-tuning the proposal using multiple MCMC runs. We hypothe- 
size that GaA might also be an effective tool for approximate Bayesian computation 
(ABC) [43]. In essence, the ABC ansatz is MCMC without an explicit likelihood 
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function [25]. The likelihood is replaced by a distance function — which plays the 
same role as our objective function — that measures closeness between a parameter- 
ized model simulation and empirical data 2>, or summary statistics thereof. When a 
uniform prior over the parameters and a symmetric proposal are assumed, a parame- 
ter vector in ABC is unconditionally accepted if its corresponding distance function 
value f(9 ig+l) ) < c T [25]. The threshold cj is a problem-dependent constant that 
is fixed prior to the actual computation. Marjoram and co-workers have shown that 
samples obtained in this manner are approximately drawn from the posterior pa- 
rameter distribution given the data 2>. While Pritchard et al. used a simple rejection 
sampler [33], Marjoram and co-workers proposed a standard MCMC scheme [25]. 
Toni and co-workers used sequential MC for sample generation [43]. To the best of 
our knowledge, however, the present work presents the first application of an adap- 
tive MCMC scheme for ABC in biochemical network parameter inference. Finally, 
we emphasize that when GaA's mean, covariance matrix, and hitting probability p 
stabilize during ABC, they provide direct access to an ellipsoidal estimation of the 
volume of the viable parameter space as defined by the threshold cj [30]. Hafner 
and co-workers have shown how to use such viable volume estimates for model 
discrimination [11]. 



4 Evaluation of the forward model 



In each iteration of the GaA algorithm, the forward model of the network needs 
to be evaluated for the proposed parameter vector 0. This requires an efficient and 
exact SSA for the chemical kinetics of the reaction network, used to generate trajec- 
tories n{t) from Ji(Q). Since GaA could well propose parameter vectors that lead 
to low copy numbers for some species, it is important that the SSA be exact since 
approximate algorithms are not appropriate at low copy number. 

In its original formulation, Gillespie's SSA has a computational cost that is 
linearly proportional to the total number M of reactions in the network. If many 
model evaluations are required, as in the present application, this computational cost 
quickly becomes prohibitive. While more efficient formulations of SSA have been 
developed for weakly coupled reaction networks, their computational cost remains 
proportional to M for strongly coupled reaction networks [35]. A reaction network is 
weakly coupled if the number of reactions that are influenced by any other reaction 
is bounded by a constant. If a network contains at least one reaction whose firing 
influences the propensities of a fixed proportion (in the worst case all) of the other 
reactions, then the network is strongly coupled [35]. Scale-free networks as seem to 
be characteristic for systems biology models [1, 42] are by definition strongly cou- 
pled. This is due to the existence of hubs that have a higher connection probability 
than other nodes. These hubs frequently correspond to chemical reactions that pro- 
duce or consume species that also participate in the majority of the other reactions, 
such as water, ATP, or CO2 in metabolic networks. 
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We use partial-propensity methods [35, 36] to simulate trajectories according to 
the solution of the chemical master Eq. 7 of the forward model. Partial-propensity 
methods are exact SSAs whose computational cost scales at most linearly with the 
number N of species in the network [35]. For large networks, this number is usu- 
ally much smaller than the number of reactions. Depending on the network model 
at hand, different partial-propensity methods are available for its efficient simula- 
tion. Strongly coupled networks where the rate constants span only a limited spec- 
trum of values are best simulated with the partial-propensity direct method (PDM) 
[35]. Multi-scale networks where the rate constants span many orders of magni- 
tude are most efficiently simulated using the sorting partial-propensity direct method 
(SPDM) [35]. Weakly coupled reaction networks can be simulated at constant com- 
putational cost using the partial-propensity SSA with composition-rejection sam- 
pling (PSSA-CR) [37]. Lastly, reaction networks that include time delays can be 
exactly simulated using the delay partial-propensity direct method (dPDM) [38]. 
Different combinations of the algorithmic modules of partial-propensity methods 
can be used to constitute all members of this family of SSAs [36]. We refer to the 
original publications for algorithmic details, benchmarks of the computational cost, 
and a proof of exactness of partial-propensity methods. 



5 Objective function 

In the context of parameter identification of stochastic biochemical networks, a 
number of distance or objective functions have previously been suggested. Reinker 
et al. proposed an approximate maximum-likelihood measure under the assumption 
that only a small number of reactions fire between two experimental measurement 
points, and a likelihood based on singular value decomposition that works when 
many reactions occur per time interval [40]. Koutroumpas et al. compared objec- 
tive functions based on least squares, normalized cross-correlations, and conditional 
probabilities using a Genetic Algorithm [21]. Koeppl and co-workers proposed the 
Kantorovich distance to compare experimental and model-based probability distri- 
butions [20]. Alternative distance measures include the Earth Mover's distance or 
the Kolomogorov-Smirnov distance [32]. These distance measures, however, can 
only be used when many experimental trajectories are available. In order to mea- 
sure the distance between a single experimental trajectory n(t) and a single model 
output «(/), we propose a novel cost function f(9_) = f{^(9_), n) that reasonably 
captures the kinetics of a monostable system. We define a compound objective func- 



tion/(0)=/ 1 (0)+/ 2 (0) with 



4 



£f =0 | ACF, (»;)- ACF, («,-)! 
I^ ACF,(n«) 



/i(G) = I>, 



/2(0) = L 



(18) 



i=i 



where 
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tt(";)-Afr(fy ) N 2 



(19) 



with the central moments given by 



, Z^njito + ip-l)^) if /=1 

^ \(\I^=i(nj(to + (q-l)At aip )-n l (nj)y\) l / i otherwise 

and the time-autocorrelation function (ACF) at lag Z given by 

AnTJ , s ^(foH'^O + Mfexp)-!^!^)) 2 

ACF, n; = — — \ . 

The variable z x is the lag at which the experimental ACF crosses for the first time. 
The function f\ (0) measures the difference between the first four moments of n and 
n. This function alone would, however, not be enough to capture the kinetics since 
it lacks information about correlations in time. This is taken into account by 
measuring the difference in the lifetimes of all chemical species. These lifetimes are 
systematically modulated by the volume £2 [39], hence enabling volumetric mea- 
surements of intra-cellular reaction compartments along with the identification of 
the rate constants. 

The present objective function allows inclusion of experimental readouts from 
image-based systems biology. The moment-matching part is a typical readout from 
fluorescence photometry, whereas the autocorrelation of the fluctuations can directly 
be measured using, e.g., FCS. 
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Fig. 1 In silica data for all test cases, a. Time evolution of the populations of three species in the 
cyclic chain model at steady state (starting at = 2000). b. Time evolution of the populations of 
two species in the aggregation model at steady state (starting at to = 5000). c. Same as b, but during 
the transient phase (starting at to = 0). 
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6 Results 

We estimate the unknown parameters for two reaction networks: a weakly coupled 
cyclic chain and a strongly coupled non-linear colloidal aggregation network. For 
the cyclic chain we estimate at steady state. For the aggregation model we estimate 
both at steady state and in the transient phase. Every kinetic parameter is allowed 
to vary in the interval [1CT 3 , 10 3 ] and the reaction volume Q. in [1, 500]. Each GaA 
run starts from a point selected uniformly at random in logarithmic parameter space. 



6.1 Weakly coupled reaction network: cyclic chain 

The cyclic chain network is given by: 



In this linear network, the number of reactions M is equal to the number of species 
N. The maximum degree of coupling of this reaction network is 2, irrespective of the 
size of the system (length of the chain), rendering it weakly coupled [35]. We hence 
use PSSA-CR to evaluate the forward model with a computational complexity in 
0(1) [37]. In the present test case, we limit ourselves to 3 species and 3 reactions, 
i.e., N = M = 3. The parameter vector for this case is given by (9 = [ki,k2,k^\, since 
the kinetics of linear reactions is independent of the volume Q [39]. 

We simulate steady-state "experimental" data n using PSSA-CR with ground 
truth k\ =2, k2 = 1.5, kj, = 3.2 (see Fig. la). We set the initial population of the 
species to n\ (t = 0) = 50, n%{t = 0) = 50, and n^(t = 0) = 50 and sample a single 
CME trajectory at equi-spaced time points with 4f exp = 0.1 between t = to and 
t = t + (K- l)At exp with t = 2000 and K = 1001 for each of the 3 species Si, S 2 , 
and S3. For the generated data we find z x = 7- 

We generate trajectories from the forward model for every parameter vector 
proposed by GaA using PSSA-CR between t = and t = (K - 1 )At exv = 100, start- 
ing from the initial population n,(f = 0) = n,(f = fo)- 

Before turning to the actual parameter identification, we illustrate the topography 
of the objective function landscape for the present example. We fix £3 =3.2 to its 
optimal value and perform a two-dimensional grid sampling for k\ and ki over the 
full search domain. We use 40 logarithmically spaced sample points per parameter, 
resulting in 40 2 parameter combinations. For each combination we evaluate the ob- 
jective function. The resulting landscapes of /i(0), fi{6), an d f(§) are depicted in 
Fig. 2a. Figure 2b shows refined versions around the global optimum. We see that 
the moment-matching term f\ (0) is largely responsible for the global single-funnel 
topology of the landscape. The autocorrelation term fi{(f) sharpens the objective 
function near the global optimum and renders it locally more isotropic. 




i = l,...,JV-l 



i=N . 



(21) 
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We perform both global optimization and ABC runs using GaA. In each of the 15 
independent optimization runs, the number of function evaluations (FES) is limited 
to MAX_FES= 1000M = 3000. We set the initial step size to r(°) = 1 and perform all 
searches in logarithmic scale of the parameters. Independent restarts from uniformly 
random points are performed when the step size r drops below 10~ 4 [29]. For each 
of the 15 independent runs, the 30 parameter vectors with the smallest objective 
function value are collected and displayed in the box plot shown in the left panel of 
Fig. 3a. All 450 collected parameter vectors have objective function values smaller 
than 1.6. The resulting data suggest that the present method is able to accurately 
determine the correct scale of the kinetic parameters from a single experimental 
trajectory, although an overestimation of the rates is apparent. 

We use the obtained optimization results for subsequent ABC runs. We conduct 
15 independent ABC runs using cj = 2. The starting points for the ABC runs are 
selected uniformly at random from the 450 collected parameter vectors in order to 
ensure stable initialization. For each run, we again set MAX_FES= 1000M = 3000. 
The initial step size is set to 0.1, and the parameters are again explored in log- 
arithmic scale. For all runs we observe rapid convergence of the empirical hitting 



a 
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Fig. 2 a. Global objective function landscape for the cyclic chain over the complete search domain 
for optimal kj, = 3.2. The three panels from left to right show f\ (9), fi{&), and f(8), respectively, 
b. A refined view of the global objective function landscape near the global optimum. The three 
panels from left to right show f\ (0), fz{ff), and f(6), respectively. The white dots mark the true, 
optimal parameters. 
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probability p emp to the optimal p = £ (see Sec. 3). We collect the ABC samples 
along with the means and covariances of GaA as soon as |p e mp ~ p\ < 0.05. As an 
example we show the histograms of the posterior samples for a randomly selected 
run in Fig. 3b. The means of the posterior distributions are again larger than the true 
kinetic parameters. Using GaA's means, covariance matrices, and the correspond- 
ing hitting probabilities that generated the posterior samples, we can construct an 
ellipsoidal volume estimation [30]. This is done by multiplying each eigenvalue of 
the average of the collected covariance matrices with c pemf = inv^(p emp ), the n- 
dimensional inverse Chi-square distribution evaluated at the empirical hitting proba- 
bility. The product of these scaled eigenvalues and the volume of the 71-dimensional 

n 

unit sphere, \S(n) \ = r ^ +l y then yields the ellipsoid volume with respect to a uni- 
form distribution (see [30] for details). The resulting ellipsoid contains the optimal 
kinetic parameter vector and is depicted in the right panel of Fig. 3a. It has a volume 
of 0.045 in log-parameter space. This constitutes only 0.0208% of the initial search 
space volume, indicating that GaA significantly narrows down the viable parameter 
space around the true optimal parameters. 



a 



b 




logio fc i lo gio k 2 logio h 



Fig. 3 a. Left panel: Box plot of the 30 best parameter vectors from each of the 15 independent 
optimization runs. The blue dots mark the true parameter values. Right panel: Ellipsoidal volume 
estimation of the parameter space below an objective-function threshold cj = 2 from a single ABC 
run. b. Empirical posterior distributions of the kinetic parameters from the same single ABC run 
with cj = 2. The red lines indicate the true parameters. 
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6.2 Strongly coupled reaction network: colloidal aggregation 



The colloidal aggregation network is given by: 
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(22) 

For this network of N species, the number of reactions is M = ^- +N +1. The 
maximum degree of coupling of this reaction network is proportional to N, ren- 
dering the network strongly coupled [35]. We hence use SPDM to evaluate the 
forward model with a computational complexity in 0(N) [35]. We use SPDM in- 
stead of PDM since the search path of GaA is unpredictable and could well gen- 
erate parameters that lead to multi-scale networks. For this test case, we limit our- 
selves to two species, i.e., N = 2 and M = 5. The parameter vector for this case is 
= [*n,ifcii,*F,*f,*f,12]. 

We perform GaA global optimization runs following the same protocol as for the 
cyclic chain network with MAX_FES = 1000(M + 1) = 6000. 



6.2.1 At steady state 

We simulate "experimental" data n using SPDM with ground truth k\\ = 0.1, fen = 
1.0, kf = 2.1, kf = 0.01, kf = 0.1, and Q = 15 (see Fig. lb). We set the initial 
population of the species to n\(t = 0) = 0, n%(f = 0) = 0, and nj(t = 0) = and 
sample K = 1001 equi-spaced data points between t = to and t = to + (K— l)At exp 
with to = 5000 and At exp =0.1. 

We generate trajectories from the forward model for every parameter vector 
proposed by GaA using SPDM between t = and t = (K — l)At exp = 100, stating 
from the initial population n,(f = 0) = n,(f = to). 

The optimization results are summarized in the left panel of Fig. 4a. For each 
of 15 independent runs, the 30 lowest-objective parameters are collected and shown 
in the box plot. We observe that the true parameters corresponding to O2 =ku, 
3 = fe° n , 4 = kf, and 5 = kf are between the 25 th and 75 th percentiles of the 
identified parameters. Both the first parameter and the reaction volume are, on av- 
erage, overestimated. Upon rescaling the kinetic rate constants with the estimated 
volume, we find norm = [0j /0 6j 2 , 03 06, 04, 0s], which are the specific probability 
rates of the reactions. The identified values are shown in the right panel of Fig. 4a. 
The median of the identified 0" orm coincides with the true specific probability rate. 
Likewise, Q^ orm is closer to the 25 th percentile of the parameter distribution. This 
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suggests a better estimation performance of GaA in the space of specific probability 
rates, at the expense of not obtaining an estimate of the reaction volume. 



6.2.2 In the transient phase 

We simulate "experimental" data in the transient phase of the network dynamics 
using the same parameters as above between t — to and t = (K— l)4f exp with to = 
0, At exp = 0.1, and K = 1001 (see Fig. lc). We evaluate the forward model with 
«,(f = 0) = n,(f = to) to obtain trajectories between t = and t = (K — l)4f exp for 
every proposed parameter vector 0. 

The optimization results for the transient case are summarized in Fig. 4b. We 
observe that the true parameters corresponding to 63 = k° n , 85 — fc? , and 0(, = £2 
are between the 25 th and 75 th percentiles of the identified parameters. The remaining 
parameters are, on average, overestimated. In the space of rescaled parameters noim 
we do not observe a significant improvement of the estimation. 



a 
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Fig. 4 a. Left panel: Box plot of the 30 best parameter vectors from each of the 15 independent 
optimization runs for the steady-state data set. Right panel: Box plots of the normalized parameters 
(see main text for details), b. Left panel: Box plot of the 30 best parameter vectors from each of the 
15 independent optimization runs for the transient data set. Right panel: Box plot of the normalized 
parameters (see main text for details). The blue dots indicate the true parameter values. 
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7 Conclusions and Discussion 

We have considered parameter estimation of stochastic biochemical networks from 
single experimental trajectories. Parameter identification from single time series is 
desirable in image-based systems biology, where per-cell estimates of the fluores- 
cence evolution and its fluctuations are available. This enables quantifying cell-cell 
variability on the level of network parameters. The histogram of the parameters 
identified for different cells provides a biologically meaningful way of assessing 
phenotypic variability beyond simple differences in the fluorescence levels. 

We have proposed a novel combination of a flexible Monte Carlo method, the 
Gaussian Adaptation (GaA) algorithm, and efficient exact stochastic simulation al- 
gorithms, the partial-propensity methods. The presented method can be used for 
global parameter optimization, approximate Bayesian inference under a uniform 
prior, and volume estimation of the viable parameter space. We have introduced an 
objective function that measures closeness between a single experimental trajectory 
and a single trajectory generated by the forward model. The objective function com- 
prises a moment-matching and a time-autocorrelation part. This allows including 
experimental readouts from, e.g., fluorescence photometry and fluorescence corre- 
lation spectroscopy. 

We have applied the method to estimate the parameters of two monostable re- 
action networks from a single simulated temporal trajectory each, both at steady 
state and during transient phases. We considered the linear cyclic chain network and 
a non-linear colloidal aggregation network. For the linear model we were able to 
robustly identify a small region of parameter space containing the true kinetic pa- 
rameters. In the non-linear aggregation model, we could identify several parameter 
vectors that fit the simulated experimental data well. There are two possible reasons 
for this reduced parameter identifiability: either GaA cannot find the globally op- 
timal region of parameter space due to high ruggedness and noise in the objective 
function, or the non-linearity of the aggregation network modulates the kinetics in 
a non-trivial way [39, 10]. Both cases are not accounted for in the current objective 
function, thus leading to reduced performance for non-linear reaction networks. 

We also used GaA as an adaptive MCMC method for approximate Bayesian in- 
ference of the posterior parameter distributions in the linear chain network. This en- 
abled estimating the volume of the viable parameter space below a given objective- 
function value threshold. We found these volume estimates to be stable across in- 
dependent runs. We thus believe that GaA might be a useful tool for exploring the 
parameter spaces of stochastic systems. 

Future work will include (i) alternative objective functions that include tempo- 
ral cross-correlations between species and the derivative of the autocorrelation; (ii) 
longer experimental trajectories; (iii) multi-stable and oscillatory systems; and (iv) 
alternative global optimization schemes. Moreover, the applicability of the present 
method to large-scale, non-linear biochemical networks and real-world experimen- 
tal data will be tested in future work. 
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